INTRODUCTION:

Summoning the "Collective Wisdom of Top Thinkers..." 



In using this phrase to characterize the conference reported in
the following pages, Professor Margaret Wang, convener of the
conference, had chosen her words with care. And they were justified.

The conference brought together an outstanding array of edu- 
cational researchers, practitioners and policy-makers. Their 
challenge, posed by the National Institute of Education, was to 
identify research needs in evaluating and documenting large-scale 
school improvement efforts to serve disadvantaged populations. In
short: how could the best scientific and humanistic know-how
available today assure that further public investments in
compensatory education would bring solid results and insights?

The conference was convened at the Learning Research and 
Development Center (LRDC) of the University of Pittsburgh by 
Professor Wang, Sponsor-Director of LRDC's Follow Through Model,
on March 12-13, 1981. Twenty-five experts, representing the 
frontiers of evaluation research in the country today, met to dis- 
cuss issues in the design of supporting research for the planning 
and development of future Follow Through Projects. 

The official conference title was "Documentation of School
Improvement Efforts: Some Technical Issues and Future Research
Agenda." But the real "working" title might well have been "This
Time, Let's Do It Better."






Co-sponsored by the National Institute of Education (NIE)
and the LRDC as part of the "second strand" of NIE's Follow
Through Planning Conferences, the conference grew out of activity
that started back in 1980, when the Office of Elementary and
Secondary Education and the Office of Educational Research and
Improvement authorized the NIE to embark on long-term research and
demonstration projects to try out alternative approaches to
educating disadvantaged primary school students. NIE used 1981 for
planning. Advice, recommendations and input were sought from a
wide range of individuals. The Institute commissioned 44 papers
and arranged invitational conferences in Portland, Oregon and
Austin, Texas to address . . . concerns, and in Pittsburgh,
Pennsylvania (the report of which follows) to address research
and evaluation. Subsequently, on June 10, 1981, a Request for
Proposal (RFP) was issued by the NIE and, after a nationwide
competition, four contracts were awarded on September 30, 1981 to
Oakland, California; Napa, California; Detroit, Michigan; and
Cotopaxi, Colorado.

The experts invited to the Pittsburgh conference were all
told that:

Since NIE is planning to embark on a 15-20 year program,
recommendations are needed for both short-term and long-term
activities. The LRDC and NIE conference will mostly address
long-term activities and methods for continually receiving
recommendations.

Participants also received a copy of the NIE Planning Paper of
October 1980. In their invitational letter dated March 2, 1981,
they were updated on NIE's current thinking as follows:

As a result of the planning conferences in Portland and
Austin, NIE has refined its tentative plan for its
involvement with pilot Follow Through projects. Basically, we are
planning to seek low-cost school-wide approaches toward
educating disadvantaged children. This implies an emphasis on
methods for the management of instruction rather than
curriculum development. The conceived evaluation activities
would document the implementation of each approach over a
three to five-year period and would be designed to serve
potential adoption/adaption sites in their efforts to
determine whether to implement a similar program at their
site.



A synthesis of these three conferences was published by the NIE. 
"Planning For Follow Through Research and Development" includes a 
short history of Follow Through and is available by writing to 
Charles Stalford, National Institute of Education, Stop 9, 
1200 19th Street, N.W., Washington, D.C. 20208-1101.



From the LRDC-NIE Conference, NIE is seeking sets of
written recommendations on:

1. the evaluation of NIE-funded pilot Follow Through
projects

2. needed evaluation methodology research

3. needed instrumentation research and

4. compensatory education research.

Recommendations on conducting evaluations are needed for
projects to commence during the next school year. Research
recommendations are needed to provide support for
future program and evaluation activities.



They Came with Baggage 



The participants shared another common background over and
above the challenge presented them by the NIE. Many of them had
been involved with Follow Through from its inception, and the
others had been well acquainted with its mission and its
struggles. The relevance of this background to their responses to
the NIE challenge was well expressed by Prof. Wang:

Those of us who have been affiliated with Follow Through
since its early years probably can recall the heated
debates we had for years over the one central question: how
do we go about identifying the best models of early
childhood education? This dispute prevailed during Follow
Through's first decade and involved a range of topics which
included the specification of the goals and objectives of
the National Follow Through Program, the expected outcomes
of the Program, the best measures to assess those outcomes,
and the ill-fated match between the measures used in the
national FT evaluation and the goals and objectives of
specific model programs. The design problems we discussed
were concerned mostly with the non-comparability of the
control groups used in the national evaluation study of FT.
We were very much preoccupied with the classic question:
If you get good results, how can you be sure it's because
of your program if you don't have adequate comparison
groups?

Most of [the discussions] were based on the "givens" on
how to evaluate Follow Through at the time. The accepted
design was based on the treatment-outcome paradigm, and
the sole purpose was to compare the relative effects of
the various model programs on outcome measures. For
several years, we tried to utilize this design to solve
our evaluation problems. As it turned out, the solutions
never came, mostly because we were asking the wrong
questions...

Describing how evaluators were sidetracked by taking the
technological route, she continued:

Those of us who were worrying about the technological
and methodological issues did not give much thought to
such matters as the conditions under which we would have
to operationalize our evaluation design. We were
totally immersed in the challenge of coming up with
methodologies and designs that could answer Follow
Through's basic evaluation questions, and we were totally
ignorant of the facts of life of implementing and
studying school change. We tried to solve our problems
through the use of sophisticated statistical methods of
data analysis which were technologically elegant but
which made our lives even more difficult as we attempted
to tease out the relevant information needed to discover
which model programs were more effective.
ADVANCING THE FRONTIERS



Clearly, from the syntheses of the presentations that
follow, the field of evaluation is getting older and wiser. What
was once thought to be a simple task (that of deciding which
of several programs is better, why, and how another can be made
in its image) is now seen as highly problematical. It was
evident, listening to the researchers who spoke at this meeting, that
they were among those in the field who had shifted from what Wang
called the "fidelity perspective" to the "adaptive perspective."
The problems of evaluation designs described in part by Wang were
probed and analyzed at this meeting by "individuals not only
recognized for their talent and contributions to advanced
research, but also for their leadership in challenging colleagues
to make evaluation research useful and relevant to school
improvement," in the words of convener Wang. These presenters were:

Gene V. Glass, University of Colorado at Boulder.

Leigh Burstein, University of California at Los Angeles.

Garry McDaniels, U.S. General Accounting Office, Washington, D.C.

Ernest House, University of Illinois, Urbana.

Thomas McNamara, Philadelphia School District.

Susan Loucks, The Network Inc., Andover, Mass.

Chad Ellett, University of Georgia at Athens.

Walter Haney, The Huron Institute (Cambridge, MA).

J. Ward Keesling, System Development Corporation (Santa Monica, CA).

Dalton Miller-Jones, University of Massachusetts at Amherst.

Ernest Bernal, Creative Education Enterprises (Austin, TX).

Starting off the first day's provocative presentations, Gene
Glass challenged several basic assumptions of the field: that
well-planned innovative programs have an appreciable effect; that
research findings influence educational decisions; and that we
should be looking for one or two "right programs" for all
children and get school people to use them.

But even these radical challenges to the conventional wisdom
were exceeded by Ernest House's disputing the use of scientific
inquiry itself as the basis for social change.

Having heard such sharp affronts to some basic axioms, the
conferees took in stride both Leigh Burstein's contention that,
because children are in a dynamic environment, any conclusions
reached about one group cannot reliably predict what will happen
the next time; and Thomas McNamara's similar perception about the
effects of the dynamic school climate on staff. McNamara
recommended the "judicial method," which would involve staff in a
constant controversy and keep them learning as they weighed the
"trial" evidence. This presaged Susan Loucks' description of the
levels of increasing sophistication leading to full implementation
of a program in the classroom.

Chad D. Ellett contended that at present we can't claim to
know anything about the results of programs based on their
evaluations, because we still do not have the tools to know if the
programs are really being implemented. But he did suggest some
directions that we might take to devise these tools.

Also concerned about the lack of instrumentation was J. Ward
Keesling, but he focused on "banking" a retrievable group of
generalizable outcome measures which would serve those Follow
Through programs that had hitherto been ill-served by
standardized achievement tests.
Garry L. McDaniels explained why reports to Congress
required information that was not always useful to people working
in the field, and why no one investigator should be expected to
handle the diverse requirements of federal administrators.

Finally, the nuts-and-bolts uses and misuses of tests were
discussed by Ernest Bernal, who pointed out the inadequacies of
the testing system for language-minority students; by Walter
Haney, who would like to see tests teach, and be used by
teachers who want to zero in on their children's needs; and by Dalton
Miller-Jones, who exposed the logical inconsistencies in the
present standardized tests and urged tests that will help us
understand the cognitive processes of children so they can be
taught successfully rather than merely being sorted into winners
and losers.




Examining the Basis of Our Judgements

USEFUL EVALUATIONS 



"The art of teaching must not be subordinated
to the technology of mass testing."

Gene Glass 



"If it were not that so many people are intimidated by the
evaluators' methods, the arbitrary authority of evaluators would
more quickly be seen as illegitimate. The truth is, we
evaluators don't know much and we don't know how to use what we do
know," said Gene Glass, from the University of Colorado at
Boulder. Long regarded as an expert in the uses of research and
research analysis, Glass has been working during the past few
years on a project "summarizing and integrating research findings
of different educational treatments."

He has concluded that teachers decide matters of curriculum 
and approach on the basis of complicated understandings, beliefs, 
motives and wishes. Research findings have little influence on 
these decisions, according to Glass, and that's just as well, 
since in his experience, even well-planned innovative programs 
don't have an appreciable effect. 

Glass drove home his point with data about a wide range of
interventions: psychotherapy, programmed instruction,
treatment for hyperactivity, treatment of learning disabilities,
tutoring programs, mainstreaming, Transcendental Meditation,
behavioral treatment of stuttering, and perceptual-motor
training. "What I have found," said Glass, "is that in the majority
of cases, the variability in any experiment is, on the average,
twice as great as the improvements the experiments show. Also,
the mean is only half as great as the variability, and the odds
run about three to ten that you will find the cheaper control
treatment showing better than the innovation being tested. In
essence, then, experiments don't have large effects. We can't
reliably predict that A will be better than B, or B better than
A." In fact, Glass finds the odds about even that a new
treatment might be worse than what was going on before.
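Glass's figures can be restated as a standardized effect size. The short sketch below is not Glass's own computation; it assumes that treated and control outcomes are normally distributed with the same spread, and reads "the mean is only half as great as the variability" as an effect size of d = 0.5. Under those assumptions it computes two common ways of expressing how often the control group comes out ahead.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def share_of_controls_above_treated_mean(d: float) -> float:
    """Fraction of control-group scores lying above the treatment-group mean.

    Assumes both groups are normal with equal standard deviations and that
    the treatment mean exceeds the control mean by d standard deviations.
    """
    return 1.0 - normal_cdf(d)


def prob_random_control_beats_random_treated(d: float) -> float:
    """P(control score > treated score) for one pupil drawn from each group.

    Under the same assumptions, (treated - control) is normal with mean d and
    standard deviation sqrt(2), so the control wins with probability
    Phi(-d / sqrt(2)).
    """
    return normal_cdf(-d / sqrt(2.0))


if __name__ == "__main__":
    # Assumed reading of Glass's remark: the mean improvement is half the
    # variability (standard deviation), i.e. an effect size of d = 0.5.
    d = 0.5
    print(f"{share_of_controls_above_treated_mean(d):.2f}")      # 0.31
    print(f"{prob_random_control_beats_random_treated(d):.2f}")  # 0.36
```

Under these assumptions, roughly three control pupils in ten would score above the average treated pupil, which is consistent with reading Glass's "three to ten" as about three chances in ten. The normality and equal-spread assumptions belong to this sketch, and other readings of his odds are possible.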
In education, therefore, even if decisions to adopt innovative
programs were actually made on the basis of such things as
matrix sampling, logistic item models, factor analysis and the
like, teachers would not have had a good reason to opt for the
programs.

"Since the conditions of the Follow Through programs that
were evaluated were frequently not known or were not consistent
across programs of the same model, anyone who aspired to
replicate the 'successful' programs was bound to be disappointed,"
said Glass.

"It would not have been rational for schools or teachers to
adopt even the seemingly successful Follow Through programs on
the basis of existing statistical data. Even if one overlooked the
fact that we knew very little about the circumstances of the model
projects, the results of what was tested (which of course did not
attempt to translate the complex, subtle notions of child
development and goals of education into mass tests) did not support the
innovations sufficiently."

The problem isn't one which can be solved by investing large
amounts of money in synthesizing test results.

"Teachers don't need and don't use statistical findings of 
experiments when deciding how best to educate children. They do 
want to know whether the method is consistent with their views of 
themselves as professionals, whether the program treats pupils as 
though they were robots, delicate flowers, or children of God." 

To do better in the future, Glass advocates "evaluations
that emphasize description (principally qualitative) for informed
choice. Models should be described in terms that people consider
personally significant when they choose a particular profession
for themselves or a school for their children. Technocratic,
behavioristic and anti-democratic language should be avoided. An
ethnographic or case-study approach to evaluation should be
adopted in place of a quantitative, experimental field trial. What
teachers need to make informed decisions are:

• Some coherent, detailed portrayals of life in school 
for pupils, teachers and parents as it is colored and 
shaped by allegiance to a particular Follow Through 
model, 

• Some portrayals by disinterested, expert ethnographers
with at least two years on-site for data collection,
and

• Some portrayals focused on a broad range of concerns
including the model's philosophy, its history (since
its future must be projected), techniques, financial
and psychic costs, side-effects and after-effects,
the roles it requires people to play, its potential
for a favorable evolution, and the like."

Concluded Glass: "Our evaluations should not aspire to dis- 
cover the one or two right programs for all children and get 
everyone to follow the prescription. We need evaluations that 
will lead to adoptions by school people, who can make an informed 
choice, based on their goals and philosophies and the nature of 
their districts." 
SCIENTIFIC AND HUMANISTIC EVALUATIONS



"For the guidance of future human action, 
one would choose the humanistic study over 
the scientific one." 



Ernest House 



Two children from the same family go to the same school and
are exposed to the same programs (Distar for reading, IPI for
math). One is stimulated, the other bored. How can that be?
Because, explained researcher Ernest House, from CIRCE at the
University of Illinois and the father of the children, they had
different personalities and different learning styles. Because, too,
the vivacious teacher who had initiated the program and had taught
his older child had left the program, and the new teacher who taught
his younger child was not as lively. Also because the program itself
had lost its glowing promise by the time his second child was enrolled.

These reasons cited by House for why his two children
reacted as they did explain in small part why no program,
regardless of how specifically delineated, is the same for every child.
Even more variations occur when programs are implemented in
different schools, in different towns with different socio-economic
groups, by different teachers, etc. Given these variables, we
must question what we accomplish when we collect and quantify
data across classes, schools, and communities. What are the
implications of this insight for testing and evaluation? According to
House, they suggest we must critically evaluate why we do what we
do and whether we should change our thrust in research.

According to House, social scientists are, on the whole,
solidly in the tradition of Leonardo, Copernicus, Galileo and
Newton, that is, in the "scientific tradition." They are looking
for a mathematically measurable answer to formulate a universal law
of reality.

Modern science has three assumptions: 
• that every question has one and only one true answer,
and if one doesn't arrive at the one true answer, one
has asked the wrong question, for the right one will
yield the right answer

• that there is one method for discovering the answer and
the method is rational in character

• and that the answers discovered by such a method are
true universally for all people in all times and that
truth is not relative in any way.

House disputes the idea that physical and social reality are
similar. He believes that the "scientifically" designed findings
of the Follow Through evaluations done to date are inadequate
"even though elaborate quantitative methods were employed," not
simply because the wrong methods were employed but because the basic
assumption, that scientific inquiry can be employed for this
purpose, is incorrect. House would substitute for scientific
inquiry a report of the experiences of those involved, even though
those experiences may be "biased, subjective and undisciplined."

Citing the writings of the Italian philosopher Vico, who
proposed that there was no point in behaving as if human nature is
unchanging, House offered a disciplined alternative paradigm to the
scientific one provided by Galileo. In this alternative view,
individuals and their actions are seen in terms of their intentions
and purposes.

House recommended the one Follow Through evaluation which was
not based on the "scientific method" but was, rather, an
excellent example of the humanistic model of inquiry. The Bank Street
study, written by Zimiles and Mayer (1980), is subjective and can
be accused of being biased, but, in House's opinion, it gives far
more valuable information than the spare scientific studies most
frequently cited in the literature. So, for all its weaknesses
(its length being a big one), House recommended that for some
purposes and in some situations, the humanistic study may be
preferable to the scientific study.
INVESTIGATING SOCIAL PROGRAMS WHEN INDIVIDUALS BELONG
TO A VARIETY OF GROUPS OVER TIME



"Researchers must remember that they are 
dealing with dynamic subjects in a
changing environment when they set about their
work, because these dynamics limit the 
programs' predictability." 

Leigh Burstein 



How much does the context (the nature of the kids, their
out-of-school experiences, the abilities and personality of
their peer group, etc.) affect the validity of the evaluation?
According to Leigh Burstein of the University of California, Los
Angeles, researchers must be mindful that students and schools
are in constant transition, and these dynamic properties will
affect research results. Therefore, we must know:

• If the program was actually implemented.

• The adjustment teachers and students had to make.

• The effect of the reform on the social system of the
school itself (did it work at cross purposes or
blend smoothly?).

• The circumstances and nature of the children
involved in the programs.

• The effects of the composition, size, ability and
personality of the group.

• The different effects different programs had on
different kids, how long they lasted, and if children
outside the program behaved differently.

• What kinds of relationships the programs engendered,
i.e., cooperative or competitive.
• The educational achievement (both short and long
term).

• The attitudes engendered towards self and schooling; 
initiative, independence, adaptability, etc., the well- 
being that apparently resulted. 

• The possible effects of the shift from one learning
environment to another (what Burstein calls
discontinuity).

• Any changes in school attendance, special education 
placement, grade retentions, etc., that resulted. 

• Whether the students who participated found it difficult
to adapt to a new instructional style afterwards, so that
the discontinuity of experience was seen in the long run to
be detrimental even though the immediate result was that the
program fostered better skills as a result of improved
instruction. (In such a situation, districts might have to
consider keeping the same system and teacher throughout at
least the first three years of schooling, as they do in Sweden.)

Wrapping up his argument, Burstein asserted that since "program
elements are inherently interrelated and their interface,
linkages, and dependencies are at the heart of a sound
understanding of school reform efforts," "better conceptualization,
design, instrumentation and analyses will improve the process
only marginally unless refinements are directed towards
understanding both program elements and their interrelationships by
combining the focus on educational and social processes with
multiple investigations from diverse perspectives."
PUTTING AN INNOVATION "ON TRIAL"



"The judicial approach is . . .
particularly relevant for capturing and
directing the fluid, evolutionary process
of implementation."

Thomas McNamara



How is implementation like seeding clouds? Taking off from
this provocative analogy, Thomas McNamara, who directs Early
Childhood Evaluation for the Philadelphia Public Schools,
clarified some important similarities and differences.

The decision to "intervene in the continuous, massive
movement of weather systems as they roll across the earth's surface"
is made only when rain is considered absolutely essential to
remedy drought conditions, McNamara explained. A comparable
impulse to correct arid educational conditions prompts innovation
in schools. The hit-or-miss characteristics of the art of
weather control can be compared with the primitive state of the
art of school intervention. And at present, the ability to
accurately predict the effect of educational programs is comparable
to the state of weather prediction: in both cases the
knowledge base is slim.

But there the analogy ends, for, as McNamara pointed out,
"the inanimate elements comprising the complicated web of
interactions found in weather systems are far exceeded by the
complexity of the self-knowing, abstract-thinking beings we have to
deal with in the educational sphere." McNamara's long experience
in schools has taught him that "complex human changes occur
against a backdrop of existing human organizations" and cannot be
reduced to low pressure systems meeting with high humidity
conditions. Rather, the complexity of people's interactions within
organizations must be understood in a "non-mechanistic,
non-reductionist" framework.
What then is the most effective way to motivate school
people to adopt new ideas and practices and to implement them with
the commitment needed to effect change? According to McNamara,
what is needed is a method that encourages healthy controversy
for compromise and resolution. McNamara suggested
"the judicial method," which relies on human testimony and enables
people to develop a clearer understanding of the range of issues.

Specifically, 

"A trial held within the school-community context would
follow (with some modification) procedures of sound
jurisprudential practice. There would be a 'judge,' a
'jury,' a 'plaintiff,' and a 'respondent.' Witnesses would
be called to testify in behalf of a position taken on one
side or the other of a given issue. These witnesses
would be examined and cross-examined as in a court of
law. Pre-trial investigation would include interviewing
a full range of potential witnesses. This investigation
would also include the study and analysis of important
documents, test scores, and other conventional assessment
data to be presented later as exhibits during the public
proceedings. The entire activity was envisioned as a
clarification process ultimately leading not to a verdict
but to a set of recommendations provided by a citizen
jury. What was to be 'tried' was a range of important
issues confronting the local school system. The guilt or
innocence of persons within or without the system was not
to be the issue. Indictment of individuals would serve
only to subvert the major intention of the process,
namely, clarification."2

"Since the very people affected by the emerging policy will 
be intimately involved in the inquiry process, the judicial meth- 
od assures that the policy decisions are not only responsible but 
responsive to staff concerns," concluded McNamara. 



2 Wolf, Robert, "The Way I See It . . . American Education Should
Go On Trial," Educational Leadership, April 1980.
How Shall We Know What's Happening?
THE CONCERNS BASED ADOPTION MODEL



"Outcome evaluations conducted after one
year of use are apt to reflect less
impact on students than perhaps even the
previous year, when the innovation was
not used."

Susan F. Loucks, Gene E. Hall



The levels of use of an innovation echo the well-known stages of
cognitive and moral development described by Piaget and Kohlberg,
respectively.

LEVELS OF USE OF THE INNOVATION: TYPICAL BEHAVIORS

LEVEL OF USE: TYPICAL BEHAVIOR

VI RENEWAL: The user is seeking more effective alternatives
to the established use of the innovation.

V INTEGRATION: The user is making deliberate efforts to
coordinate with others in using the innovation.

IVB REFINEMENT: The user is making changes to increase
outcomes.

IVA ROUTINE: The user is making few or no changes and
has an established pattern of use.

III MECHANICAL USE: The user is using the innovation in a
poorly coordinated manner and is making user-oriented changes.

II PREPARATION: The user is preparing to use the innovation.

I ORIENTATION: The user is seeking out information about
the innovation.

0 NON-USE: No action is being taken with respect to
the innovation.
Loucks and Hall's Concerns Based Adoption Model (CBAM) not
only helps us understand the extent to which time and a series of
adjustments in attitudes and skills are involved in change; it
also provides a framework to compare programs. If two models of
innovation are competing for an administrator's favor, the one in
which teachers reach the "refinement level" within two years
might well be more attractive than the program in which most
users never progress beyond the mechanical level. The Loucks and
Hall analysis also suggests that evaluators would be wise to wait
at least two years before judging a program's success, when the
results will reflect changed behavior that is more than merely
mechanical.

Having a method to calibrate degrees of implementation has
led Hall and Loucks to some additional discoveries. While it has
long been thought incontrovertible that the more a program looks
like its model, the better the results will be, they have found that
"often some degree of adaptation relates to higher outcomes than
either high fidelity or major adaptations." They have also learned
that teachers often must participate in the development,
design and planning of the innovation if they are to succeed with
it.

These discoveries can explain a number of implementation 
oddities of the Planned Variation Follow Through experiment, par- 
ticularly the repeated phenomenon (which became apparent at 
earlier NIE planning meetings) of school administrators singing 
the praises of certain "successful" models while at the same 
sites the model-sponsors bemoaned their failure because they saw
local teachers deviating from the prescribed practice. 
HOW CAN WE KNOW IF THE PROGRAM
IS BEING USED? 



"A first objective in future efforts to
study program implementation . . . should
be to provide an empirically based
description of what the program is and is
not."

Chad D. Ellett



Before a program can be evaluated, it is necessary to
measure the degree to which it exists. One cannot appraise the
effectiveness of an approach like Direct Instruction, for example,
in a classroom in which the materials have been provided, but in
which the teacher isn't actually using the program. So, the
first order of business is to set criteria and collect data
which will reveal the degree to which the program being
evaluated is actually being conducted in a given classroom or
school.

Chad D. Ellett of the University of Georgia, with Margaret
Wang (University of Pittsburgh), has provided a framework for
doing this. Briefly, their plan entails defining "critical
dimensions" and "scaled descriptors" for each aspect of a given
Follow Through Program. For example, one critical dimension
would be the classroom teacher's communicating with learners by
clarifying directions and explanations when pupils
misunderstand. To measure whether a teacher was actually implementing
this part of the program, an observer would apply a "scale of
scoreable descriptors." On this particular scale the lowest
rating might be "discourages learners when they seek
clarification of directions," while a high-scoring teacher would "give
directions using different words when learners do not understand,
and attempt to identify areas of misunderstanding and restate
communication."
Out of many such components the authors would construct
comprehensive performance indicators for each critical dimension
of Follow Through and a generic framework for evaluating program
implementation. Such a framework for evaluation would, they
conclude, "be 'generic' in terms of the what and the how of
implementation, but flexible in nature in order to adaptively
accommodate the diversity of Follow Through models" (emphasis in
original).
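Ellett and Wang's idea of scoring critical dimensions can be made concrete with a small sketch. The version below makes several assumptions that are not part of the authors' proposal. It uses a hypothetical five-point scale on which only the two end points carry the descriptors quoted above. It rolls observer ratings up into one indicator per dimension by simple averaging. It also includes a second, hypothetical dimension name. The helpers `dimension_indicator` and `implementation_profile` are illustrative only.

```python
from statistics import mean

# Hypothetical scale for one critical dimension. Only the end points use the
# descriptors reported above; intermediate points (2-4) are assumed to exist.
CLARIFYING_DIRECTIONS = {
    1: "Discourages learners when they seek clarification of directions",
    5: ("Gives directions using different words when learners do not "
        "understand; identifies areas of misunderstanding and restates"),
}
# Rating range taken from the anchored end points of the scale.
SCALE_MIN, SCALE_MAX = min(CLARIFYING_DIRECTIONS), max(CLARIFYING_DIRECTIONS)


def dimension_indicator(ratings: list[int]) -> float:
    """Average one dimension's observer ratings (an assumed aggregation rule)."""
    if not ratings:
        raise ValueError("at least one observation is needed")
    for r in ratings:
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"rating {r} is outside {SCALE_MIN}-{SCALE_MAX}")
    return mean(ratings)


def implementation_profile(observations: dict[str, list[int]]) -> dict[str, float]:
    """Map each critical dimension to its indicator: a profile of how fully
    the program is actually in place in one classroom."""
    return {dim: dimension_indicator(r) for dim, r in observations.items()}


# Hypothetical ratings from three classroom visits.
profile = implementation_profile({
    "clarifying directions": [4, 5, 3],
    "managing instructional time": [2, 2, 3],
})
print(profile)
```

A profile like this describes what the program is and is not in a given classroom before any outcome is attributed to it. A low indicator on one dimension flags where implementation, rather than the model itself, may explain weak results.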
RESEARCH NEEDS: SELECTION CONSIDERATIONS,
AND ALTERNATIVE OUTCOME INDICATORS



"Even if some agreement can be reached on
the outcomes of interest, this does not
guarantee agreement on the instruments to
be used to measure the outcomes."

J. Ward Keesling and 
Allen G. Smith

Whenever professionals in the field of program development and
evaluation write or speak about the Follow Through program, they
complain that only rather mundane measurement tools were used to
evaluate the innovative Follow Through models, which brimmed with
interesting and exciting consequences for children, parents,
paraprofessionals, teachers, schools and communities. In these
many-faceted programs:

• Children made leaps in learning, health and fitness, 
initiative, independence, and emotional growth; 

• Parents learned to be firmer and more patient; to 
guide, to teach, to make personal decisions, and to 
collaborate with the schools; 

• Paraprofessionals coped better in the marketplace 
and amassed credentials for new careers; 

• Teachers and administrators developed new respect 
and understanding towards project children and their 
families; teachers' time and space management in 
class improved; administrators became more skilled
in staff relations; 

• The schools involved parents more, made necessary 
curricular changes, and improved relationships with
other educational agencies. 



Yet these myriad outcomes were overlooked because acceptable
results "funneled down" to a few narrow standardized measures of 
change. 
J. Ward Keesling of Advanced Technology, Inc., and Allen
G. Smith of System Development Corporation bemoan that waste.
"Research conducted by the Follow Through sponsors themselves 
on alternative outcomes and measures covers at least 5 linear 
feet of shelf-space," recalled Keesling. "While many of the
tests and measurements that were invented are too program specif- 
ic and/or too expensive to administer widely, some of them 
could be useful in other programs." 

The trick is to find the useful, inexpensive, suitable
material that matches the outcomes of a number of programs
and make it widely available. Keesling and Smith's sensible
proposal is "a review process similar to the Joint Dissemination
Review Panel (JDRP) that would pass on the acceptability of
the instrument for general use."
A FEDERAL ADMINISTRATOR'S PERSPECTIVE



"Evaluations specifically designed to answer
the questions asked by federal administrators
may not help those who want to
know what happened to the children."

Garry L. McDaniels



People planning and implementing strategies to document
school improvement efforts should know how federal programs like
Follow Through are evaluated, especially since such evaluations are
likely to differ substantially from evaluations that would be of use
to teachers, school administrators, and community leaders.

Garry L. McDaniels of the Institute for Program Evaluation,
United States General Accounting Office, provided the
following useful blueprint showing what questions a standard
evaluation of a federal program seeks to answer:

I. Identifying the goals of Congress 

A. Who are the intended beneficiaries?

B. What services are envisioned for those beneficiaries? 

C. What administrative mechanism did the Congress en- 
vision to provide services? 

D. What positive impacts were expected? (What negative
impacts were to be guarded against?)

II. Describing the executive branch's program

A. Who are the beneficiaries receiving services? 

B. What array of services exists? What is the relative
frequency of services?
C. What administrative actions has the executive branch
taken? What administrative mechanisms are in place? 

D. What impacts appear to be caused by or associated
with the presence of these services? 

III. Providing an analysis and synthesis of the data collected

A. Are the intended beneficiaries being served by this 
program? Are they receiving services they might not 
have been otherwise receiving as a result of this 
program? 

B. Are the services being received consistent with those 
envisioned in the Act? 

C. Are the actions taken by the executive branch consis- 
tent with those expected by the Congress (e.g., regu- 
lations, distribution of effort)? 

D. Are the impacts identified related to the services 
provided and are these impacts consistent with the 
intent of the legislation? 

IV. Providing recommendations

A. for the law 

B. for the executive branch 

C. for the local administration of services and/or 
federal funds. 

The immensity and complexity of this agenda of questions has
brought McDaniels to the conviction that the job is one for many 
hands. 

"Experience has led me to believe," said McDaniels, "that it 
is physically and intellectually impossible for a single organiza- 
tion to organize and execute a major program evaluation because 
no single organization has a sufficient pool of talent, and
techniques favored by researchers working for a particular
organization tend to favor similar techniques of data gathering. As a
result, when I hear that one RFP has been announced to evaluate a 
given program I feel the policy-maker will be poorly served." 

McDaniels would like federal agencies to use several differ- 
ent contractors — each chosen because they can contribute 
uniquely to an aspect of the program being studied. He would 
also like to see a number of different methodologies employed,
each designed for a specific question. But no one
investigator should be made to feel that his or her study should
have the goal of clarifying all aspects of a major policy question.


Finally, McDaniels believes that a report should be commissioned
to synthesize all the studies and to clearly identify
the cumulative meaning of findings. The individual reports of
investigators should not leave out important details for the sake
of "untechnical" readers.

"The investigator reports for a federal evaluation should be
good examples of scientific writing — technically responsible 
and readable," he concluded. 
Finding the Right Measures, Inventing New Measures



ASSESSING LANGUAGE-MINORITY STUDENTS 



"Hispanics and other language minority
groups have become victims of test abuse 
and test misuse." 

Ernest M. Bernal 



"The only group that profits from the use of English-based achievement
tests on limited English proficient children are the test makers," contended
Ernest Bernal of Creative Educational Enterprises (Austin, Texas).
He argued that the tests are harmful to the children and of 
little use to educators. Unless they are redesigned, "most of the 
achievement and affective data will be worthless!" 


According to Bernal, the present tests are unreliable, except for 
the short run. When language-minority children are tested in 
English, the tests inadequately assess their aptitudes, attitudes, 
achievement and development. Nor do they predict which students are
likely to succeed. 

While the test results therefore have been of little prac- 
tical value, they have had considerable negative effect because 
teachers often predict children's failure on the basis of test
results and give up on them.

Bernal reminded us that because these children (singly, and 
in a group) are different, if their scores are to be included in 
the new Follow Through program evaluations, student variables 
unique to this group will have to be dealt with, i.e.: 

• their competence or lack of it in both English and 
their own language, 

• their general communicative competency,

• their achievement in select subject areas,

• their cognitive style,

• their self-esteem, inter-ethnic attitudes and
own-language attitudes.

These children also test differently, which makes them hard
to evaluate, explained Bernal. No one has succeeded in correctly
interpreting the test scores of students who are taking math or
science achievement tests and are not proficient in the language
of the test. Sometimes the results are startling, as when
children show a sudden, extraordinary pre-test to post-test gain
(which merely means that between the tests, they have learned to
read). On the other hand, the scores of those who don't learn to
read get worse as the norm expectations increase in difficulty.
So when the scores of these two groups are averaged, "Presto! No
gains!" Bernal asserted that during the evaluation of the first
Follow Through programs, some evaluators were so stumped that
they "pulled" these students' scores so that they wouldn't be
included in the analysis.
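A short sketch can show the arithmetic behind Bernal's "Presto! No gains!" The scores below are invented, percentile-like numbers. They illustrate how averaging two subgroups whose scores move in opposite directions can hide real change. They are not Follow Through data, and `mean_gain` is a hypothetical helper.

```python
from statistics import mean


def mean_gain(pre: list[float], post: list[float]) -> float:
    """Mean of per-student differences (post-test minus pre-test)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have one score per student")
    return mean(b - a for a, b in zip(pre, post))


# Invented scores. Students who learned to read between tests jump sharply.
learned_pre, learned_post = [20, 22, 18], [40, 42, 38]
# Students who have not yet learned to read slip as norm expectations rise.
not_yet_pre, not_yet_post = [30, 28, 32], [10, 8, 12]

print(mean_gain(learned_pre, learned_post))    # 20
print(mean_gain(not_yet_pre, not_yet_post))    # -20
# Pooling both groups cancels the two effects.
print(mean_gain(learned_pre + not_yet_pre, learned_post + not_yet_post))  # 0
```

Reporting subgroup gains separately, rather than only the pooled mean, keeps both effects visible.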

In addition to student variables, variables specific to ESL 
and bilingual programs inevitably confuse evaluation results. We 
need to know: 

• the language proficiency (oral and written) in both 
languages of teachers and aides. 

• the proportions of instructional time and content in 
English and the non-English language. 

• which of the instructors (the more prestigious teacher 
or the less prestigious aide) conducts instruction in 
English and which in the second language. 

• the type of bilingual or ESL instruction provided. 

"If we do not consider deliberately the relationship of lan- 
guage minority students to the entire new Follow Through effort, 
their presence by design or accident may become a nuisance, a 
'noise' or cacophony which our interventions, instruments and
methodology are ill-prepared to orchestrate," warned Bernal. 
EXPANDING THE USES OF TESTS



"If we view standardized tests not simply
as measurement instruments but as
sources of direct learning, then perhaps
we might develop them in different ways."

Walter Haney 



For years tests have been used to sort children (I.Q.
tests), to uphold educational standards as antidotes to grade
inflation (Regents and achievement tests), and as debating material
in a continuing discussion on the main aims of education
(high school competency tests). But we have not yet found ways
to construct tests that are terribly helpful as direct aids in
teaching and learning, according to Walter Haney of the Huron
Institute.

Norm-referenced tests are unsuitable for measuring a program's
effectiveness because they are constructed to be insensitive
to the effects of instruction in local school systems
(which all have different curricula). Nevertheless they are now
frequently used for this purpose, although their results can be
misleading, said Haney. Even the criterion-referenced instruments
that are designed to measure a program's effectiveness are not
sufficiently refined to do this well. "Work on criterion-referenced
measurements seems to be progressing far faster on
technical issues, such as methods of item analysis, setting cutoff
scores, assessing decision consistency, and applying generalizability
theory to analyze variance in test results, than on the
substance and skills of what has or has not been learned."


Haney would like to see more effective tests for program 
evaluation, and a new emphasis in test-making: tests that actually
teach children.

Would the primary functions of tests, sorting and policy-making,
be violated by developing tests that aid
instruction? Haney thinks not, pointing out that the uses of
tests have changed through the years and will no doubt change
again. "Not many years ago, educational program evaluation was
viewed as research in the service of decision-making, but studies
since then have shown that findings rarely have contributed directly
to decision-making in the way that was expected. Now
program evaluation is seen less as applied science and more as a
descriptive enterprise, and it is possible that testing as part
of the evaluative enterprise could be aimed less at formal inference
and selection and more at description."

How would tests be developed from which people could learn?
Haney thinks a reasonable place to begin would be with theories of
learning such as Benjamin Bloom's theory of mastery learning, which
highlights four elements of "quality instruction": cues, participation,
reinforcement and feedback.

If tests were to be designed as learning instruments they 
might provide: 



1. Cues that could be altered or adapted to present 
those which work best for particular learners — 
i.e., written cues for some students, oral cues for 
others.

2. Opportunities for active participation and practice 
with differences in the amount of practice or parti- 
cipation depending on the individual learning style 
and needs of students. 

3. Reinforcers which would be adapted to the particular 
learner (since what is a reward for one child may 
not be for another).

4. Quick and corrective feedback for students, when and 
where needed. 

"When tests are viewed strictly as measurements, alternative 
modes might be viewed as a problem, namely as extraneous sources 
of error variance. But from the learning perspective, alternative
modes might be viewed more positively as differentially appropriate
for students with different learning styles," says
Haney.

"Specifically they might: 

• be available in alternative modes of presentation
(e.g., oral, written and via video screen rather than
simply written)
• be labeled in terms familiar to test-takers rather
than in terms of psychological constructs or behavioral
domains (e.g., word wizard tests rather than vocabulary tests)

• be self-scoring or scoreable by individual test- 
takers 



• be of variable length 



• provide results not only on whether answers are right 
or wrong but on the nature of errors or sources of 
corrective instruction." 

Haney concluded by suggesting that the role of evaluators 
could be changed from "producing knowledge to give to educators 
for purposes of educational improvement" to "providing tools to
educators and society generally with which to communicate about
education goals and values and providing instruments to learners
to improve learning."
ASSESSING ABILITIES OF BLACK CHILDREN



"Test items should be designed to elicit
the most sophisticated, complex or at
least the most appropriate cognitive processes
in these children."

Dalton Miller-Jones



Alice has been prodded and prompted by adults since birth. 
She has learned to "read" adult questions. Whenever her mother 
or grandmother asked "What kind of fruit do you want, an apple? 
An orange? A banana?" and Alice answered, "A cookie," she was 
gently reminded they had said fruit. By three, Alice wasn't making
that "mistake" any more. She knew what they wanted to hear. She 
learned to say, "I don't want a fruit, I want a cookie." 

Betty's mother worked. When Betty wanted something to eat,
she knew where to find it, and got it for herself. Often her
older sister simply shared what she was having without asking.

So, as Dalton Miller-Jones from the University of Massachusetts
pointed out, while both children know that apples and oranges
are fruits, Alice will always say they are similar because
they are fruits, whereas Betty may tell you that what is most
important about their similarity to her is that she likes eating
both of them. Alice is said to have a higher I.Q. because she
knows what adults want to hear.

Charles is blond. When asked what a brunette is, he says
it's a person with "dark brownish hair." Daryl is black; he
tells you a brunette has "light brownish hair." Although the
dictionary says brunette is "a reddish moderate brown," Charles
is "right" while Daryl is "wrong." Since this is a question from
an I.Q. test, Charles' I.Q. score is higher than Daryl's.

These and other such questions used as indicators of "intelligence"
statistically "prove" black children have lower intellectual
ability than white children.
According to Miller-Jones, the logical inconsistencies in
the standardized I.Q. tests are legion. It is correct to say
that houses are made of bricks and wood but incorrect to say they
are made of sticks and nails. It is correct to say windows are
made of "glass and wood" but incorrect to say they are made of
"screens and putty." It is correct to say books are made of paper,
plastic and something hard for covers but incorrect to say
they are made of "pictures and pages."

As Miller-Jones pointed out, "there appears to be no intellectual
distinction (by test makers) between acceptable and unacceptable
responses, and there is no consistency in the criteria
invoked . . ." Perhaps of more consequence, there is no feedback
to the child. Children who are not trained, as Alice was, to know
from experience what adults want will answer the first thing that
comes to mind and assume, because they get no negative feedback,
that any answer is as good as any other.

Without this feedback, black children are likely to give
unacceptable answers. It is also argued that minority children
have different cognitive styles and cultural traditions developed
as a result of their different environments. These children are
affectively oriented and use what could be considered relational
styles, while schools typically support and are oriented to analytic
styles. Miller-Jones concurs with Asa Hilliard of Georgia
State University that, unlike Euro-Americans, who tend to believe
that anything can be divided and subdivided into parts and these
add up to a whole, "Afro-Americans tend to respond to things in
terms of whole picture instead of its parts; that [they] prefer to
focus on people and their activities rather than things or objects;
that [they] lean toward altruism and social cooperation;
and that [they] tend not to be 'word' dependent for meaning, relying
heavily on actual behavior and experience."

Citing further evidence of differences in cognitive style
provided by other researchers, Miller-Jones suggested that these
children probably need more varied stimuli for learning and
more practice in the "accepted" modes of analysis. To assess the
language and cognitive development of these children, Miller-Jones
recommends multiple cognitive and language eliciting materials,
culturally salient subject matter and materials, a familiar,
comfortable test environment, a variety of tasks relative to inductive
rather than deductive styles, and a demonstration of what is
expected through concrete examples given before the test.

Miller-Jones believes more research is needed in determining
the relation of conceptual and cognitive styles to school
performance in reading, math and social studies. Equally useful
would be diagnostic profiles that probe how students arrive at
answers. "What are we asking children to do that sends some
children down blind alleys to pursue unfruitful strategies?" asked
Miller-Jones. "We don't know now, but we do know that some children
develop strategies that work for a while but ultimately
subvert their learning process, like sticking with first letters
and context when reading, or memorizing books until their memories
give out. This is not just an issue of minority assessment," he
continued. "To help all children fulfill their potential we have
to study the cognitive operations of the children who seem to get
it right all the time, along with those who persist in getting it
wrong."
DISCUSSION



Following the presentations at each of the first three sessions,
a lively discussion led by pre-selected experts around the
table gave the presenters an opportunity to clarify their thinking
and defend their positions. Much of that discussion is reflected
in the synthesis of each of the presentations.

The liveliest interchange followed the presentation
of the papers of Glass, House, Burstein and McNamara. There
was basic agreement that during the first sixteen years of Planned
Variation Follow Through great strides had been taken, but the
large-scale evaluation studies were disappointing. It was the
dimensions of the disappointment and, more important, the reasons
for the disappointment that provoked controversy.

Some agreed with Edward Zigler from Yale, one of the prime
designers of Follow Through, who felt it was a good experiment
badly executed. Others sided with David Weikart from High/Scope,
who felt that what went wrong with Follow Through was its false
presumption that we could find the one best method of educating
all children.

To those who had attended the earlier planning meetings of Follow
Through, this was another round of the same debate that
dominated discussion back then between those who
sought the ideal answer for everyone and wanted the whole
educational establishment to embrace it once it was found, versus
those who felt the necessity to keep going back to check the
social context and what the individuals within it need.

Ernest House put the argument in an even broader historical 
perspective, tracing back to the Renaissance our mistaken belief 
that the scientific method could provide suitable answers to all 
questions. He proposed a counter notion: humanistic inquiry.

The discussion did not resolve the philosophical differences 
around the table. But by the time the session was curtailed by 
the demands of the schedule, the sides were clearly drawn, with 
Edward Zigler regretting the "bad science" that created the evaluation 
problems of Follow Through, and Ernest House responding that even 
if we did it again well, we'd still have a mess because it was
built upon false presumptions. 

The most comprehensive commentary on the implementation pre- 
sentations was delivered by Convener Wang. Putting the concerns 
of the papers into a larger perspective, she arrived at three ba- 
sic sets of recommendations for NIE and the field's considera- 
tion: 

"1. Models have neither unified implementation nor unified
effects across sites, nor do they replicate easily or through
similar processes from one site to another. Therefore,
to continue the pattern of trying to identify which
educational approach is best for disadvantaged children,
even if program implementation variables are
included in the evaluation design, is not only unproductive
but will tend to yield misleading evaluations.

2. Implementation of innovative school improvement
programs continues to change. The implementation
process is affected not only by the nature of the
intervention but also by a host of factors that vary
from situation to situation. Therefore, evaluation
of innovative programs requires a developmental perspective
and dictates the use of a longitudinal
design with repeated measurement and a focus on
'improvements' rather than 'proof.'

3. The study of implementation requires an interactive
and multifaceted approach using multiple criteria
and methods of data collection and analysis. Information
is needed not only for use by the consumers
of the innovations to improve their program implementation
but also to further our understanding of
the implementation process. Such information can
facilitate the widespread adoption of innovations for
meeting school improvement needs in a variety of
school contexts."
CONCLUSIONS AND RECOMMENDATIONS



During the fourth session four groups made up of the conference
participants met simultaneously to make summary recommendations
to the NIE and to the field at large. Because the
participants in the discussion groups are distinguished leaders
in the field, we have listed them by group to give the reader a
sense of the philosophical and clinical mix of those who
concurred in the final recommendations.

Group I: FIRST THINGS FIRST

Topic: Supporting research for the evaluation of NIE-funded
pilot Follow Through projects

Marianne Amarel, Leigh Burstein, Celestino Fernandez, Ernest
House, Lawrence Rudner, William Tikunoff, David Weikart.

Group II: RESEARCH TO IMPROVE THE ART OF RESEARCH

Topic: Research related to methodological and technical
developments in program evaluation

William Cooley, Chad Ellett, Hortense Jones, Tom McNamara. 
Group III: TOWARDS NEW TESTS AND BETTER TESTS

Topic: Supporting research on instrumentation and
development of program implementation and outcome
measures

Ernesto Bernal, Edmund Gordon, Walter Haney, J. Ward Keesling, 
Susan Loucks. 

Group IV: THE NEXT ORDER OF BUSINESS FOR COMPENSATORY 
EDUCATION — R&D 

Topic: Issues and agenda for research and development
in compensatory education

Freda Holly, Dalton Miller-Jones, Garry McDaniels, Eugene Ramp,
Margaret Wang, Edward Zigler. 



Group I: FIRST THINGS FIRST 

The group focusing on supporting research needed for the
evaluation of the new NIE-funded pilot projects made three major
points regarding the content, the methodology, and the
dissemination of evaluation.

Content:

An evaluation report makes a good deal more sense in con- 
text. If it is accompanied by a detailed portrait of the 
school and the community, readers can understand how the lo- 
cal culture and economics shaped the program and what ef- 
fects the program has had on local conditions. Therefore 
it is helpful to collect uniform descriptive data even before 
the program starts. This data can help: 

• researchers and evaluators to understand the results in
context 

• administrators from outside the district to make knowl- 
edgeable decisions when they consider adoptions 

• administrators from inside the district to compare
their outcomes with outcomes from districts with simi- 
lar demographic and economic conditions. 
Methodology:

Evaluation modes are needed that can serve a number of
programs and models. But a distinction must be made between
evaluation data that covers a common ground (as described by
Ellett in his presentation) and evaluation data that uses
precisely the same tests to measure the effectiveness of
models, as was done in the first round of Follow Through
evaluations with unfortunate results. In that attempt to
compare the effectiveness of programs, all programs, regardless
of their intents, were submitted to the same tests.
The process was criticized by a large number of programs
that were judged to have lost a race they hadn't tried to
enter.

Dissemination:

The language of evaluation must be refined. Evaluations 
should not only communicate to sophisticated administrators 
and other evaluators, but to parents, teachers and lay 
boards of education. 

A final suggestion was not related to evaluation, but to the
benefits of retaining the sponsor-site structure of Follow
Through. The group noted that the sponsorship of programs has
worked well in the past and urged that it be continued.
Sponsorship has:

• linked local schools with agents outside the school 
who could give them training, support and guidance 

• helped locals through difficult implementation 
problems by drawing on the experience sponsors 
amassed facing similar problems in other locales 

• helped protect services which addressed the needs 
of children when they were threatened by local 
political considerations. 

But although the experienced sponsors have a track record,
this should not preclude new sponsors from being invited to
respond to the upcoming Requests for Proposals, as long as they too
are required to specify their models' instructional intent and
demonstrate prior experience in implementing educational
innovations.
Group II: RESEARCH TO IMPROVE THE ART OF RESEARCH

Needed developments in technology and methodology were the 
focus of one discussion group which proposed that the new NIE- 
funded Follow Through research program should be viewed, in part, 
as a "laboratory" for conducting more detailed studies of how 
evaluation as a process could promote and facilitate school im- 
provement. Computers, which have thus far been under-utilized in 
the field, should be more prominently used. They can be used by 
sponsors and local school districts who want the latest evaluation
information and test designs. As J. Ward Keesling pointed
out in his presentation, many programs could use an "item bank"
of outcome measures that would be closely enough matched to the 
intent of each program to be meaningful, yet generic enough to 
allow for comparison across programs. Computers could be the 
best way to make these available to the field and keep the in- 
formation current. 
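One way to picture the item bank Keesling described is as a collection of items, each tagged with the outcomes it measures. Each program draws the items that match its own intent, while items relevant to several programs support comparison across them. The sketch below rests on that tagging scheme, which is an assumption. The class and method names (`Item`, `ItemBank`, `for_program`, `common_items`) and the sample items are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    item_id: str
    outcomes: frozenset[str]  # outcome areas this item is meant to measure


@dataclass
class ItemBank:
    items: list[Item] = field(default_factory=list)

    def add(self, item: Item) -> None:
        self.items.append(item)

    def for_program(self, intended: set[str]) -> list[Item]:
        """Items measuring at least one outcome the program intends to affect."""
        return [i for i in self.items if i.outcomes & intended]

    def common_items(self, *programs: set[str]) -> list[Item]:
        """Items relevant to every listed program, usable for comparison."""
        if not programs:
            return []
        return [i for i in self.items
                if all(i.outcomes & p for p in programs)]


# Hypothetical items and program intents.
bank = ItemBank()
bank.add(Item("R1", frozenset({"reading"})))
bank.add(Item("S1", frozenset({"self-esteem"})))
bank.add(Item("R2", frozenset({"reading", "oral language"})))

program_a = {"reading", "self-esteem"}
program_b = {"oral language"}
print([i.item_id for i in bank.for_program(program_a)])  # ['R1', 'S1', 'R2']
print([i.item_id for i in bank.common_items(program_a, program_b)])  # ['R2']
```

Keeping the tags with the items lets the bank grow or be revised centrally while each program continues to select only what fits its intent.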

But while new Follow Through projects should advance the art 
and science of evaluation, that should not be the evaluators'
central focus, the group further urged. Nor should evaluators 
on-site merely document the final success or failure of a proj- 
ect. Rather they should employ their skills to help solve the 
obvious problems facing the district, and even locate subtler 
problems that inhibit program implementation and accurate assess- 
ment. 

For example, tests that are sensitive to minority and low-income
children have not yet been perfected. Such measures would be
helpful to local school districts to better evaluate their
individual school improvement efforts. At the moment they have
only nationally normed instruments to work with, which sometimes
mask the great strides they are making.

To be accurate and useful the evaluations should not merely 
measure academic achievement, the group concluded, but should 
address head-on the greater challenge of measuring those more 
complex effects which were the programs' original concern (also
see group IV). The evidence points to the need to balance data
collection with descriptive narrative, as suggested by Ernest 
House. 
Group III: TOWARDS NEW TESTS AND BETTER TESTS

Four approaches were urged to improve the instrumentation 
available: 

• the expansion of what testing can do 

• the use of tests to find more refined ways to reach 
and teach children 

• a consortium to maximize the efforts of the educational
community in developing instrumentation and
techniques 

• the development of measures of program implementa- 
tion. 

Tests That Teach: While we need new generalizable outcome
measures that cover more ground than do standardized achievement
tests (see groups I and II), some time and energy should be spent
devising tests that serve as educative devices for children:
tests that teach. We also need tests designed to shed light on
children's learning styles or skill mastery when "read" by
teachers trained to use the test data (as described by Haney).

Test Administration: Researchers, as Miller-Jones and
Bernal pointed out, have demonstrated that test directions and
administration favor children who understand the implicit rules
and penalize those who don't. Why? One reason is that we do not
know enough about how to administer tests, or how to give
directions that all children will fully understand. Nor do we
know what effect the test-taking environment has on some
children. Additional research is also needed to better
understand the many different ways children arrive at test
answers, since that knowledge could help improve their approach
to cognitive problems in both test and learning situations.

A Consortium Approach: Finding the time, money, and expertise
to improve, refine and invent tests as discussed above would be
exceedingly difficult for single sponsors or schools to
undertake. So the group envisioned a consortium of sponsors,
academics, teachers, parents and local educators that could be
managed and assisted by the National Institute of Education to
develop new measurement tools.

Measuring Implementation: Echoing the concerns expressed
the prior day by fellett and Loucks regarding the need to
pinpoint levels of implementation at sites (see groups II and
IV), this group recommended the development of tools for this
purpose. Assuming that certain programs are more difficult than
others to convey to teachers, researchers should explore the
programs' cost-effectiveness in terms of how long it takes
before innovative programs are actually put into practice. The
narrative-descriptive report format (described by House) was
considered especially appropriate here.

One member of the group suggested that perhaps a more semi- 
nal question — whether supporting research ever actually re- 
sults in changes in school practice — would be worth pursuing. 



Group IV: THE NEXT ORDER OF BUSINESS FOR COMPENSATORY
EDUCATION R&D

The group which set out to develop an agenda for research 
and development in compensatory education spotlighted three areas 
of concern: what happens to children, how to maximize the use- 
fulness of tests, and how to facilitate school change. 

1. The first priority should be to learn what happens to
children when they move from one situation to another:

• from one grade with a distinctive program, to
the next with a different one

• from home with one set of expectations, to
class with another

• from the community with one dominant culture,
to the school with another.

Researchers have explained how the cognitive styles of minority
children frequently impede them in school, but we know little
about the problems created by cultural differences. The new
round of contracts should measure a broader set of program
results than did the last Follow Through evaluations, which
focused on academic achievement (a point made also by groups I
and II). But doing so adequately will require refining the
techniques used to assess children's progress and describe their
program experiences, and finding the means to measure what is
accomplished by programs that stress process over product.

2. Current testing programs should be studied to improve
and expand the use of tests. Specifically, the group
wants to:

• maximize information obtained from tests 
• discover what various test responses say about
the child and how the insights gained can be 
used to help that child succeed in school 

• close the gap between what teachers teach and 
what tests test. 

3. Finally, Follow Through should serve as a national
laboratory for studying schooling in grades one
through three. The nationwide agenda should include
investigating deterrents to innovation and change in
the nation's schools; describing how program
implementation is accomplished in terms school people
can comprehend (a point made also by groups II and
III); monitoring the utility and accessibility of the
literature on innovation; and identifying the most
promising aspects of the partnership between parents
and teachers.



AFTERWORD



Transcending the specific recommendations and conclusions
reported above, the conference generated a spirit of commitment
best expressed by convener Margaret Wang:

"Research designed to produce useful information for
school improvement is no longer just an idea. It is
becoming a reality, largely through the kinds of
capabilities and technological advancements in program
evaluation discussed here. But continued progress
depends on scholarly advances in research aimed at
building our knowledge of what is being implemented in
our schools and how. Supporting such would be a
fruitful investment of further public funds..."